Multi-Agent Counterfactual Regret Minimization for Partial-Information Collaborative Games
Abstract
We study the generalization of counterfactual regret minimization (CFR) to partial-information collaborative games with more than 2 players. For instance, many 4-player card games are structured as 2v2 games, with each player only knowing the contents of their own hand. To study this setting, we propose a multi-agent collaborative version of Kuhn Poker. We observe that a straightforward application of CFR to this setting can lead to sub-optimal results, and we explore extensions to CFR that offer improved performance.

Counterfactual Regret Minimization (CFR) is an iterative learning approach for multi-agent adversarial partial-information games. The goal of CFR is to iteratively minimize a regret bound, called counterfactual regret, on the utility of different actions. Since counterfactual regret is an upper bound on the true regret, CFR also minimizes true regret. For two-player zero-sum games (e.g., heads-up poker), CFR therefore converges to a Nash equilibrium (Zinkevich et al. [2008]). Recently, Moravčík et al. [2017] combined CFR with state-space compression via deep learning and showed this to be effective in beating human players at 2-player no-limit Texas Hold'em poker.

In this paper, we study the generalization of CFR to partial-information games with more than 2 players. In such games, the player dynamics can be much richer; e.g., an optimal strategy might require players to both collaborate and compete. For instance, many 4-player card games are structured as 2v2 games, with each player only knowing the contents of their own hand. Canonical examples include Spades, Euchre, and Bridge. In general, CFR is not guaranteed to converge to Nash equilibrium strategies in these games, and it is not known whether it finds good solutions. To study this question, we developed a collaborative extension of Kuhn Poker, which is arguably the simplest game in this setting that admits interesting strategic behavior. We compare tabular CFR with extensions where 1) agents attempt to maximize their team's expected utility, and 2) players are given information about their partner's hand through a noisy channel, to simulate how adding a state inference model could help agents play (a sketch of one such channel is given at the end of Section 2). Related reinforcement learning approaches were studied in Foerster et al. [2017], which found that collaborative strategies can be learned by explicitly including the responses of other players in policy updates.

Our contributions are as follows. We show that when applying CFR in the 4-player setting:

• Basic CFR learns bad strategies, in which allies are antagonistic rather than collaborative.
• Using collective rewards yields strategies that dominate those found with selfish rewards.
• We analyze the performance of CFR with an additional state inference oracle that reveals hidden state information. We find that players with a perfect oracle learn a strategy that dominates that of baseline players without an oracle.
• However, we find that CFR is not robust: close-to-perfect oracles significantly degrade the quality of learned strategies, making them worse than strategies that do not use an oracle at all. This result implies there is a nontrivial performance bound on which oracles are useful.

1 Collaborative Kuhn Poker

To study extensions of CFR, we propose Collaborative Kuhn Poker (CKP), a 4-player extension of Kuhn Poker (Kuhn [1950]). CKP uses a deck of 6 cards: the Queen, King, and Ace of Hearts and of Spades. There are 4 players i ∈ {North, East, South, West} and 2 teams: North-South and East-West. Each game round, every player is given N = 3 chips and a private card s; the private cards are sampled without replacement from the deck. At the start of a round, each player places one chip in the pot. A round then proceeds in turns; on each turn t, a player takes an action a_t: betting a chip, raising the bet by 1, calling the bet, or folding. If any player folds, their partner automatically folds and the other team calls the outstanding bet. Once a bet is called, or if neither team folds, the team with the strongest poker hand (neglecting flushes) wins the pot, which is shared equally within the team.
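To make the rules concrete, the following is a minimal Python sketch of the dealing and showdown logic just described (betting is omitted). All identifiers are ours, chosen for illustration; the paper does not prescribe an implementation, and tie-breaking between teams is not specified in the text, so the sketch defaults ties to East-West.

import random

RANKS = {"Q": 0, "K": 1, "A": 2}               # Queen < King < Ace
SUITS = ("H", "S")                             # Hearts, Spades
DECK = [(r, s) for r in RANKS for s in SUITS]  # the 6-card CKP deck
PLAYERS = ("North", "East", "South", "West")   # teams: N-S vs. E-W

def deal(rng):
    """Sample 4 private cards without replacement; 2 cards remain unseen."""
    return dict(zip(PLAYERS, rng.sample(DECK, 4)))

def hand_strength(cards):
    """Rank a team's 2-card hand, neglecting flushes: a pair beats a
    high card, and otherwise hands compare by sorted ranks."""
    ranks = sorted((RANKS[r] for r, _ in cards), reverse=True)
    is_pair = 1 if ranks[0] == ranks[1] else 0
    return (is_pair, ranks[0], ranks[1])

def showdown(hands):
    """Return the winning team at showdown (ties default to East-West)."""
    ns = hand_strength([hands["North"], hands["South"]])
    ew = hand_strength([hands["East"], hands["West"]])
    return "North-South" if ns > ew else "East-West"

For example, deal(random.Random(0)) assigns 4 of the 6 cards to the players, and showdown(...) resolves the round once betting ends.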
2 Generalizing Counterfactual Regret Minimization

CFR. Partial-information games can be formalized by infosets I_i^t (all information known to player i at time t) and a strategy profile, which encodes player behavior as a map σ : I ↦ P(a|I) from infosets to distributions over the possible actions a. To learn the optimal strategy σ*, CFR algorithms minimize the counterfactual regret R_i(I, a; σ), which, following Zinkevich et al. [2008], is defined after T iterations as

R_i^T(I, a) = (1/T) Σ_{t=1}^{T} [ v_i(σ^t_{I→a}, I) − v_i(σ^t, I) ],

where v_i(σ, I) is the counterfactual value of infoset I (the expected utility to player i when I is reached, weighting histories by the probability that chance and the other players reach them) and σ^t_{I→a} is the profile identical to σ^t except that action a is always taken at I.
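At each infoset, tabular CFR maintains these cumulative regrets and derives the next strategy by regret matching. The sketch below shows the per-infoset bookkeeping under our own naming (the paper does not publish code); a full implementation would call update_regrets and accumulate from a recursive traversal of the game tree, with node_value = Σ_a σ(a|I) · action_values[a].

from collections import defaultdict

class InfosetNode:
    """Per-infoset state for tabular CFR: cumulative counterfactual
    regrets R(I, a) and a running average strategy."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.regret_sum = defaultdict(float)    # cumulative R(I, a)
        self.strategy_sum = defaultdict(float)  # reach-weighted strategy sum

    def strategy(self):
        """Regret matching: play in proportion to positive cumulative
        regret, falling back to uniform when no regret is positive."""
        pos = {a: max(self.regret_sum[a], 0.0) for a in self.actions}
        norm = sum(pos.values())
        if norm > 0.0:
            return {a: r / norm for a, r in pos.items()}
        return {a: 1.0 / len(self.actions) for a in self.actions}

    def update_regrets(self, action_values, node_value, opp_reach):
        """Accumulate counterfactual regret: the gain of always taking a
        at I over the current strategy, weighted by the probability that
        chance and the other players reach I."""
        for a in self.actions:
            self.regret_sum[a] += opp_reach * (action_values[a] - node_value)

    def accumulate(self, strategy, my_reach):
        """Accumulate the current strategy, weighted by the player's own
        reach probability, so that average_strategy() is well-defined."""
        for a, p in strategy.items():
            self.strategy_sum[a] += my_reach * p

    def average_strategy(self):
        """The time-averaged strategy, which is what CFR's convergence
        guarantees apply to."""
        norm = sum(self.strategy_sum.values())
        if norm == 0.0:
            return {a: 1.0 / len(self.actions) for a in self.actions}
        return {a: self.strategy_sum[a] / norm for a in self.actions}

Under the collective-reward extension studied above, one would feed team payoffs rather than individual payoffs into action_values; the regret updates themselves are unchanged.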
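The excerpt does not specify the noisy channel through which partner-hand information is revealed. One plausible instantiation, for concreteness only: with probability p the oracle reports the partner's true card, and otherwise a uniformly random other card, with the reported card appended to the player's infoset key; p = 1 recovers the perfect oracle. The function name and parameterization below are ours.

import random

def noisy_oracle(partner_card, deck, p, rng):
    """Hypothetical oracle channel: report the partner's true private
    card with probability p, else a uniformly random different card.
    p = 1.0 models the perfect oracle; p < 1.0 models an imperfect
    state inference model."""
    if rng.random() < p:
        return partner_card
    return rng.choice([c for c in deck if c != partner_card])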